I demonstrate that, rather unexpectedly, there exist noisy quantum
channels for which the optimal classical information transmission rate
is achieved only by signaling alphabets consisting of nonorthogonal
quantum states.

Within the framework of classical information theory, there is a tacit
but basic assumption that a communication channel's possible inputs
correspond to a set of mutually exclusive properties for the information
carriers. In the brief instant after a signal leaves the sender's hand,
but before it enters a noisy channel, an independent observer or wire
tap should be able (in principle, at least) to read out the signal with
complete reliability. Anything less than complete reliability in this
readout represents an extra source of noise over and above that which is
supplied by the channel. This is a situation that both the sender and
receiver work to avoid.

When quantum systems are used as information carriers, one's natural
inclination is that the same basic assumption should hold. For instance,
one might think that encoding distinct signals in nonorthogonal quantum
states must be less than optimal for information transfer. This is
because the readout possibilities for times intermediate to the signal's
generation and its entrance into the channel are excluded automatically:
it is a matter of physical law that nonorthogonal quantum states cannot
be distinguished with perfect reliability, and any attempt to do so
(even imperfectly) imparts a disturbance to them. These are the
principles that encourage the use of nonorthogonal signals for
cryptographic purposes; however, just because of this, one would not
expect them to play a role in questions to do with reliable, public
communication.

In what follows, I present an example that dispels this prejudice:
signals encoded in nonorthogonal quantum states are sometimes required
to achieve the highest information transfer rate that a channel can
yield. In particular, I present a noisy quantum mechanical channel for
which the channel capacity expression recently derived by Holevo and by
Schumacher and Westmoreland is only achieved by signals consisting of
nonorthogonal states.

In order to state the result, I first review the standard notion of a
quantum discrete memoryless channel (QDMC). For such a channel, the
information carriers are quantum systems with a finite dimensional
Hilbert space $\mathcal{H}_d$, where $d$ denotes the dimension. The
action of the channel is assumed to be due to interactions between the
carrier and an independent environment outside the sender's and
receiver's control. Thus, the channel's action on the carrier's quantum
state $\rho$ (most generally, a density operator) can be represented as
an evolution of the form
$\rho \rightarrow \Phi(\rho) = \mathrm{tr}_E\big(U(\rho\otimes\tau)U^\dagger\big)$,
where $\tau$ denotes the standard state of the environment, $U$ is some
unitary operator, and $\mathrm{tr}_E$ denotes a partial trace over the
environmental degrees of freedom. A convenient theorem of Kraus is that
a mapping $\Phi$ holds the form above if and only if it can also be
represented as



\[
\Phi(\rho) = \sum_i A_i \rho A_i^\dagger \tag{1}
\]

for some set of (possibly nonhermitian) operators $A_i$ satisfying
$\sum_i A_i^\dagger A_i = 1$ ($1$ = the identity operator). The channel
is memoryless when the evolution for arbitrary states $\sigma$
(including entangled ones) on $\mathcal{H}_d^{\otimes n}$ is

\[
\sigma \rightarrow \Phi^{\otimes n}(\sigma)
\]

for each $n$. That is to say, the noise acts independently on each
information carrier sent down the channel.

Let us now consider using a QDMC for the purpose of transmitting
classical information. What we imagine here is a sender encoding various
messages $u$, $u = 1,\ldots,N$, into an equal number of pure state
preparations (i.e., one-dimensional projectors) $\Pi_u$ on
$\mathcal{H}_d^{\otimes n}$. Along the way to the intended receiver, the
states evolve according to the rule above, generally emerging as mixed
states $\rho_u = \Phi^{\otimes n}(\Pi_u)$. Finally, the receiver
performs some measurement $\{E_u\}$ (mathematically, a positive
operator-valued measure, or POVM), with one outcome for each message
$u$. The game here is that the measurement outcome is used to represent
the receiver's best guess of the quantum state $\rho_u$, and
consequently the message $u$, appearing at the output of the channel.

Note that the formulation so far is completely general in its usage of
the QDMC. In particular, the quantum states used to encode the messages
may be massively entangled across the $n$ transmissions. Moreover, the
POVM $\{E_u\}$ may be a collective quantum measurement over the whole
Hilbert space $\mathcal{H}_d^{\otimes n}$, and need not factorize into
measurements on the individual carriers. For the considerations here,
however, I restrict attention to senders using encodings based on a
finite alphabet. A sender is said to make use of a finite alphabet when
his signals are restricted to be product states on all of
$\mathcal{H}_d^{\otimes n}$, the factors of which are drawn from some
fixed finite set $X = \{\Pi_\ell\}$, $\ell = 1,\ldots,m$, of pure states
on $\mathcal{H}_d$. That is to say, the sender is now imagined to encode
messages $u = (\ell_1,\ldots,\ell_n)$ into quantum states of the form
$\Pi_u = \Pi_{\ell_1}\otimes\cdots\otimes\Pi_{\ell_n}$. Such an
encoding, taken as a whole, is called a code.

With this, we can turn to the issue of reliable transmission of
information through the channel. A
$(\lfloor 2^{nR}\rfloor, n, \lambda_n)$ code, $0 < R < 1$, is a set of
$\lfloor 2^{nR}\rfloor$ codewords $\Pi_u$ (each of length $n$) such that
the maximum probability of error in guessing a message is $\lambda_n$,
i.e., $\lambda_n = \max_u\big(1 - \mathrm{tr}(\rho_u E_u)\big)$. The
number $R$ appearing in this definition is known as the rate of
information transfer of the code. A rate $R$ is said to be achievable if
there exists a sequence of $(\lfloor 2^{nR}\rfloor, n, \lambda_n)$ codes
with $\lambda_n \rightarrow 0$ as $n \rightarrow \infty$. The capacity
$C$ of the QDMC is the supremum of all achievable rates, where the
supremum is taken explicitly over all alphabets used for coding, all
codes making use of that alphabet, and all possible POVMs used for
decoding at the receiver. Our main concern here is in finding the
optimal alphabet for the encoding, the issue being whether the optimal
alphabet must consist of orthogonal states or not.

A method of calculating the capacity has been known for some time when
the POVM elements $E_u$ are, like the codewords in this scenario,
restricted to be tensor product operators on
$\mathcal{H}_d^{\otimes n}$. This restriction is equivalent to saying
that collective measurements on codewords are excluded from the game;
each information carrier is measured individually. The restricted
capacity $C_1$ is given by the supremum of the accessible information
$I_1(\mathcal{E})$ over all signal ensembles
$\mathcal{E} = \{p_i, \Pi_i\}$, $p_i > 0$, $\sum_i p_i = 1$; i.e.,

\[
C_1 = \sup_{\mathcal{E}} I_1(\mathcal{E}) \tag{2}
\]

where

\[
I_1(\mathcal{E}) = \sup_{\{E_b\}} \Big[ H\big(\mathrm{tr}(\rho E_b)\big)
  - \sum_i p_i H\big(\mathrm{tr}(\rho_i E_b)\big) \Big], \tag{3}
\]

$\rho_i = \Phi(\Pi_i)$ are the output states, $\rho = \sum_i p_i \rho_i$,
and $H(\mathrm{tr}(\tau E_b)) = -\sum_b \mathrm{tr}(\tau E_b)\log
\mathrm{tr}(\tau E_b)$ is the Shannon entropy for the probability
distribution $\mathrm{tr}(\tau E_b)$ derived from a POVM $\{E_b\}$.
(All logarithms throughout are calculated base 2.)

Expression (2) coincides with the standard classical capacity theorem of
Shannon [10] for a discrete memoryless channel: it is just that in the
quantum case extra care must be taken to optimize both the input
alphabet and the output observable, since neither is given a priori.
Note that the supremization in Eq. (3) is over all POVMs on
$\mathcal{H}_d$: for this expression there is no restriction that the
number of POVM elements be the same as the number of states in the
alphabet $X$. However, convexity arguments can be used to show that
Eqs. (2) and (3) are achievable by ensembles and POVMs each with no more
than $d^2$ elements.



Recently, an elegant expression for the capacity $C$ has been derived
which dispenses with an explicit optimization over the receiver's
measurement. The theorem is that

\[
C = \sup_{\mathcal{E}} I(\mathcal{E})
\]

where

\[
I(\mathcal{E}) = S(\rho) - \sum_i p_i S(\rho_i), \tag{4}
\]

$\rho_i$ and $\rho$ are defined as above, and
$S(\tau) = -\mathrm{tr}(\tau\log\tau)$ is the von Neumann entropy of a
density operator $\tau$. Here again, convexity arguments give that the
supremum can be achieved by signal ensembles consisting of no more than
$d^2$ terms.
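
For qubit outputs, Eq. (4) reduces to simple arithmetic on Bloch
vectors, because a qubit state with Bloch vector of length r has
eigenvalues (1 ± r)/2. The following Python sketch evaluates the
quantity I(E) under that assumption. The function names are illustrative
and do not appear in the original text.

```python
import math

def h(z):
    """(1/2)[(1+z) log2(1+z) + (1-z) log2(1-z)], with the convention 0 log 0 = 0."""
    return 0.5 * sum(x * math.log2(x) for x in (1.0 + z, 1.0 - z) if x > 0.0)

def qubit_entropy(bloch):
    """von Neumann entropy in bits of the qubit state (1 + bloch . sigma)/2."""
    r = math.sqrt(sum(x * x for x in bloch))
    return 1.0 - h(min(r, 1.0))  # eigenvalues are (1 +/- r)/2

def holevo_quantity(probs, outputs):
    """I(E) = S(rho) - sum_i p_i S(rho_i), with output states given as Bloch vectors."""
    mean = [sum(p * v[k] for p, v in zip(probs, outputs)) for k in range(3)]
    return qubit_entropy(mean) - sum(p * qubit_entropy(v) for p, v in zip(probs, outputs))

# Two equiprobable orthogonal pure outputs.
one_bit = holevo_quantity([0.5, 0.5], [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0)])
# Two identical mixed outputs, which cannot carry any information.
no_bits = holevo_quantity([0.5, 0.5], [(0.3, 0.0, 0.0), (0.3, 0.0, 0.0)])
print(one_bit, no_bits)
```

The Bloch-vector form is chosen here only because every channel
considered below acts on a single qubit; for general d one would
diagonalize the density operators instead.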

It is important to note that, depending upon the channel, $C$ can be
strictly greater than $C_1$. This is a result of the fact that
collective measurements generally afford more power for distinguishing
product states than do product measurements. Moreover, this point is
doubly significant for the task at hand because collective measurements
also appear to be the key for eliciting the optimality of nonorthogonal
inputs.

With the theorems (2) and (4) for the capacities $C_1$ and $C$ in hand,
the last remark can be made precise. The question is this. Do there
exist channels for which Eq. (4) is achieved only by an ensemble of
nonorthogonal states? I will answer this in the affirmative by
explicitly constructing an example of a channel on $\mathcal{H}_2$ that
requires, at the very least, a nonorthogonal binary alphabet to achieve
capacity. That is to say, I shall exhibit a particular $\Phi$ and a
particular ensemble
$\mathcal{E}_p = \{\tfrac12, \tfrac12; \Pi_0, \Pi_1\}$ with
$\mathrm{tr}(\Pi_0\Pi_1) \neq 0$ for which
$C \ge I(\mathcal{E}_p) > \sup_{\mathcal{E}_\perp} I(\mathcal{E}_\perp)$.
The rightmost supremization in this is taken exclusively over ensembles
of orthogonal states.

As stated earlier, this situation is somewhat surprising. Indeed it can
be shown for general $\Phi$ and $\mathcal{H}_d$ that when the issue is
of distinguishing two outputs in an optimal way (rather than optimizing
information rate) and there are no restrictions on the inputs or the
POVMs, then orthogonal inputs are always sufficient. Moreover, when
$d = 2$ and the input alphabet is binary, the capacity $C_1$ is always
achievable by an orthogonal alphabet: this will be demonstrated later in
the paper. For the present, I turn to the particular example.

The "splaying" channel acting on density operators of $\mathcal{H}_2$ is
described simply enough by means of a Kraus representation as in
Eq. (1). The $A_i$ used to define it are
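\[
A_x = \sqrt{\tfrac23}\,|x\rangle\langle x|, \qquad
A_+ = \sqrt{\tfrac23}\,|y\rangle\langle +|, \qquad
A_- = \sqrt{\tfrac23}\,|\bar y\rangle\langle -|, \tag{5}
\]

where, fixing an orthonormal basis $\{|x\rangle, |\bar x\rangle\}$ on
$\mathcal{H}_2$,

\[
|y\rangle = \tfrac{1}{\sqrt2}\big(|x\rangle + |\bar x\rangle\big), \qquad
|\bar y\rangle = \tfrac{1}{\sqrt2}\big(|x\rangle - |\bar x\rangle\big), \tag{6}
\]

\[
|\pm\rangle = \tfrac12\,|x\rangle \pm \tfrac{\sqrt3}{2}\,|\bar x\rangle . \tag{7}
\]
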
The action of this channel can be thought of in more graphic terms as
follows. Let us make a switch to Bloch-sphere notation for all
operators. The channel, personified as Eve, begins by performing the
symmetric three-outcome "trine" POVM as the quantum states make their
way from sender to receiver. I.e., the positive operators in her POVM
are given by

\[
E_i = \tfrac13\,(1 + n_i\cdot\sigma), \tag{8}
\]

where $\sigma$ is the vector of Pauli matrices, $n_x = (1,0,0)$, and
$n_\pm = (-1/2, \pm\sqrt3/2, 0)$. The three vectors here are 120 degrees
apart and confined to the x-y plane; as must be the case for all POVMs,
$E_x + E_+ + E_- = 1$. Upon receiving outcome $i$, Eve forwards a
quantum state $\eta_i$ to Bob according to the rule

\[
\eta_x = \tfrac12\,(1 + x\cdot\sigma) \quad\text{and}\quad
\eta_\pm = \tfrac12\,(1 \pm y\cdot\sigma), \tag{9}
\]

where $x = (1,0,0)$ and $y = (0,1,0)$. The key idea is that if $E_x$ is
detected, the state corresponding to the outcome is forwarded to the
receiver; however, if $E_+$ or $E_-$ is detected, orthogonal or
"splayed" versions of the outcomes are sent.
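
The measure-and-forward picture can be cross-checked against a Kraus
form of the kind in Eq. (1). The sketch below assumes operators of the
form sqrt(2/3) |forwarded><measured|, with real state vectors written in
the basis {|x>, |xbar>}. This particular choice is an assumption that is
consistent with Eqs. (8) and (9) and is fixed only up to phases; the
variable names are illustrative.

```python
import math

R2, R3 = math.sqrt(2.0), math.sqrt(3.0)

# Real state vectors in the orthonormal basis {|x>, |xbar>} (assumed phases).
KET_X = (1.0, 0.0)
KET_Y = (1 / R2, 1 / R2)       # Bloch vector +y
KET_YBAR = (1 / R2, -1 / R2)   # Bloch vector -y
KET_PLUS = (0.5, R3 / 2)       # Bloch vector n_+
KET_MINUS = (0.5, -R3 / 2)     # Bloch vector n_-

def outer(ket, bra, scale=1.0):
    return [[scale * ket[i] * bra[j] for j in range(2)] for i in range(2)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

# Assumed Kraus operators A_i = sqrt(2/3) |forwarded_i><measured_i|.
S = math.sqrt(2.0 / 3.0)
KRAUS = [outer(KET_X, KET_X, S), outer(KET_Y, KET_PLUS, S), outer(KET_YBAR, KET_MINUS, S)]

def completeness():
    """Sum of A_i^dagger A_i; the matrices are real, so dagger is transpose."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for a in KRAUS:
        total = add(total, matmul(transpose(a), a))
    return total

def apply_channel(rho):
    """Phi(rho) = sum_i A_i rho A_i^dagger."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for a in KRAUS:
        out = add(out, matmul(matmul(a, rho), transpose(a)))
    return out

identity_check = completeness()
rho_out = apply_channel(outer(KET_X, KET_X))  # send the state |x><x|
print(identity_check, rho_out)
```

Sending |x> triggers outcome x with probability 2/3 and each of the
other two outcomes with probability 1/6, so the output should be
(2/3)|x><x| plus one third of the maximally mixed state, i.e. a diagonal
matrix with entries 5/6 and 1/6.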

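If the sender transmits a general pure state

\[
\Pi_{\alpha\beta} = \tfrac12\,(1 + s_{\alpha\beta}\cdot\sigma) \tag{10}
\]

where $s_{\alpha\beta} = (\cos\alpha\sin\beta, \sin\alpha\sin\beta,
\cos\beta)$, for $\alpha \in [0, 2\pi)$ and $\beta \in [0, \pi)$, the
upshot of Eve's interference (as far as the sender and receiver are
concerned) is the evolution
$\Pi_{\alpha\beta} \rightarrow \Phi(\Pi_{\alpha\beta})$ where

\[
\Phi(\Pi_{\alpha\beta}) = \sum_i \mathrm{tr}(\Pi_{\alpha\beta} E_i)\,\eta_i
  = \tfrac12\,(1 + t_{\alpha\beta}\cdot\sigma) \tag{11}
\]

and

\[
t_{\alpha\beta} = \tfrac13\,\big(1 + \cos\alpha\sin\beta,\;
  \sqrt3\,\sin\alpha\sin\beta,\; 0\big).
\]

This follows since
$\mathrm{tr}(\Pi_{\alpha\beta}E_x) = (1 + \cos\alpha\sin\beta)/3$ and
also $\mathrm{tr}(\Pi_{\alpha\beta}E_\pm) = (2 - \cos\alpha\sin\beta
\pm \sqrt3\,\sin\alpha\sin\beta)/6$.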
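
As a check on Eq. (11), the channel can be simulated directly in Bloch
form: compute the three outcome probabilities (1 + n_i . s)/3 and
average the forwarded Bloch vectors with those weights. The Python
sketch below does this and compares against the closed form for the
output vector; the function names and the test angles are illustrative
choices.

```python
import math

SQRT3 = math.sqrt(3.0)

# Trine POVM directions n_x, n_+, n_- of Eq. (8).
POVM_DIRECTIONS = [(1.0, 0.0, 0.0), (-0.5, SQRT3 / 2, 0.0), (-0.5, -SQRT3 / 2, 0.0)]
# Bloch vectors of the forwarded states eta_x, eta_+, eta_- of Eq. (9).
FORWARDED = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def outcome_probabilities(s):
    """tr(Pi E_i) = (1 + n_i . s)/3 for an input with Bloch vector s."""
    return [(1.0 + dot(n, s)) / 3.0 for n in POVM_DIRECTIONS]

def splay(s):
    """Output Bloch vector: outcome-weighted average of the forwarded states."""
    probs = outcome_probabilities(s)
    return tuple(sum(p * f[k] for p, f in zip(probs, FORWARDED)) for k in range(3))

def closed_form(alpha, beta):
    """Output vector t as stated just after Eq. (11)."""
    c = math.cos(alpha) * math.sin(beta)
    s = math.sin(alpha) * math.sin(beta)
    return ((1.0 + c) / 3.0, SQRT3 * s / 3.0, 0.0)

alpha, beta = 0.9, 1.3  # arbitrary test angles
s_in = (math.cos(alpha) * math.sin(beta),
        math.sin(alpha) * math.sin(beta),
        math.cos(beta))
simulated, expected = splay(s_in), closed_form(alpha, beta)
print(simulated, expected)
```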

With Eq. (11), one can readily calculate Eq. (4) for an arbitrary
ensemble of orthogonal input states. Suppose the state in Eq. (10) and
one orthogonal to it (i.e., with Bloch vector $-s_{\alpha\beta}$) are
sent through the channel with prior probabilities $t$ and $1 - t$,
respectively. Calling the result of Eq. (4) $I(\alpha,\beta,t)$, this
gives

\[
\begin{aligned}
I(\alpha,\beta,t) ={}& \phi\Big(\big(1 + (2t-1)\cos\alpha\sin\beta\big)^2
      + 3\big((2t-1)\sin\alpha\sin\beta\big)^2\Big) \\
  &- t\,\phi\Big(\big(1 + \cos\alpha\sin\beta\big)^2
      + 3\big(\sin\alpha\sin\beta\big)^2\Big) \\
  &- (1-t)\,\phi\Big(\big(1 - \cos\alpha\sin\beta\big)^2
      + 3\big(\sin\alpha\sin\beta\big)^2\Big),
\end{aligned}
\tag{12}
\]

where $\phi(x) = -h(\sqrt{x}/3)$ and

\[
2\,h(z) = (1+z)\log(1+z) + (1-z)\log(1-z). \tag{13}
\]



One can easily check that Eq. (12) is maximized when
$\alpha = \beta = \pi/2$ and $t = 1/2$, yielding a value of

\[
C_{\rm ortho} = \frac16 \log\frac{3125}{1024} \approx 0.268273
\ \text{bits}. \tag{14}
\]
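
The value in Eq. (14) can be reproduced numerically from Eqs. (12) and
(13). The Python sketch below evaluates I(alpha, beta, t) at the stated
maximizer and compares it with a coarse grid scan; the grid resolution
is an arbitrary choice, so the scan is a sanity check rather than a
proof of global optimality.

```python
import math

def h(z):
    """Eq. (13): 2 h(z) = (1+z) log2(1+z) + (1-z) log2(1-z), with 0 log 0 = 0."""
    return 0.5 * sum(x * math.log2(x) for x in (1.0 + z, 1.0 - z) if x > 0.0)

def phi(x):
    return -h(math.sqrt(x) / 3.0)

def info_orthogonal(alpha, beta, t):
    """Eq. (12) for the orthogonal pair with Bloch vectors +s and -s."""
    c = math.cos(alpha) * math.sin(beta)
    s = math.sin(alpha) * math.sin(beta)
    u = 2.0 * t - 1.0
    mixture = (1.0 + u * c) ** 2 + 3.0 * (u * s) ** 2
    first = (1.0 + c) ** 2 + 3.0 * s ** 2
    second = (1.0 - c) ** 2 + 3.0 * s ** 2
    return phi(mixture) - t * phi(first) - (1.0 - t) * phi(second)

c_ortho = info_orthogonal(math.pi / 2, math.pi / 2, 0.5)
closed_form = math.log2(3125.0 / 1024.0) / 6.0

# Coarse scan over alpha in [0, 2 pi], beta in [0, pi], t in [0, 1].
grid_max = max(
    info_orthogonal(2 * math.pi * i / 36, math.pi * j / 18, k / 20)
    for i in range(37) for j in range(19) for k in range(21)
)
print(c_ortho, closed_form, grid_max)
```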



Now consider the following ensemble of inputs. Let $\Pi_0$ be a state
given by Eq. (10) but with $\beta = \pi/2$, and let
$\Pi_1 = \Pi_{-\alpha}$ be the corresponding state with $\alpha$
replaced by $-\alpha$. Assume each of these occurs with prior
probability 1/2. Thus, the two signaling states in this ensemble are
(generally) nonorthogonal, but restricted to the plane of the POVM
elements and reflecting their symmetry. Again, one readily calculates
Eq. (4) to get

\[
I(\alpha) = \phi\big((1 + \cos\alpha)^2\big)
  - \phi\big((1 + \cos\alpha)^2 + 3\sin^2\alpha\big).
\]

The analytic maximization of this quantity depends upon the solution of
a transcendental equation. Therefore, the maximization requires some
numerical work: it turns out to be attained when
$\alpha = 1.521808 \neq \pi/2$, roughly 87.2 degrees. The value of the
maximum is

\[
C_{\rm nonortho} \approx 0.268932 \ \text{bits}. \tag{15}
\]
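
The numerical work referred to here is easy to repeat. The Python sketch
below maximizes I(alpha) with a golden-section search; the search method
and the bracket [1.0, 2.0] are choices made for this illustration (the
bracket is assumed to contain a single maximum) and are not taken from
the original text.

```python
import math

def h(z):
    """Eq. (13): 2 h(z) = (1+z) log2(1+z) + (1-z) log2(1-z), with 0 log 0 = 0."""
    return 0.5 * sum(x * math.log2(x) for x in (1.0 + z, 1.0 - z) if x > 0.0)

def phi(x):
    return -h(math.sqrt(x) / 3.0)

def info_nonorthogonal(alpha):
    """I(alpha) for the equiprobable in-plane states at angles +alpha and -alpha."""
    c, s = math.cos(alpha), math.sin(alpha)
    return phi((1.0 + c) ** 2) - phi((1.0 + c) ** 2 + 3.0 * s ** 2)

def golden_max(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:            # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
        else:                  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + ratio * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

alpha_star = golden_max(info_nonorthogonal, 1.0, 2.0)
c_nonortho = info_nonorthogonal(alpha_star)
c_ortho = info_nonorthogonal(math.pi / 2)  # orthogonal inputs, for comparison
print(alpha_star, math.degrees(alpha_star), c_nonortho, c_ortho)
```

Evaluating the same function at alpha = pi/2 recovers the orthogonal
value of Eq. (14), so the gap between the two printed rates is the
advantage of the nonorthogonal alphabet for this channel.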



This completes the demonstration that a QDMC's classical information
capacity need not be achievable by orthogonal states. The difference in
this particular example is not large, but it is enough to prove the
principle.

Heuristically, what is going on with the splaying channel is that, from
a "God's eye view," the output $\eta_x$ acts like an erasure flag,
signifying the disappearance of a bit. As the angle $\alpha$ is reduced,
the probability of a flagged erasure increases, and so the information
rate decreases. As $\alpha$ is made larger, the transmission probability
for distinguishable bits (i.e., $\eta_+$ and $\eta_-$) increases, but
there is an accompanying increased probability that a bit will have
flipped. The angle $\alpha$ in Eq. (15) represents the optimal tradeoff
between these tensions, as quantified by the capacity formula for $C$ in
Eq. (4); in other words, when the full power of collective quantum
measurements is made available at the receiver.

The last point appears to be crucial for understanding the origin of
this effect. When each qubit is measured individually, the optimal
tradeoff between the tensions is quantified by the capacity $C_1$ given
in Eq. (2). In that case, it should be noted that the erasure flag's
contribution to the tensions effectively disappears; with respect to
individual measurements, the erasure flag always manifests itself as a
probability for a bit flip error. This is seen easily with an example.
If the ensemble $\{\Pi_0, \Pi_1\}$ (equal prior probabilities) is used,
but no collective measurements, then it turns out that there is enough
symmetry in the problem that Eq. (3) can be calculated explicitly. When
two equiprobable states with equal-length Bloch vectors $a$ and $b$ are
to be distinguished, the optimal measurement is specified by the unit
vectors parallel and anti-parallel to $d = a - b$, and the accessible
information is given by
$I_1(a,b) = -\phi\big(9\,(a\cdot\hat d)^2\big)$, where
$\hat d = d/|d|$. For the case at hand, we obtain
$I_1(\alpha) = -\phi(3\sin^2\alpha)$, which is achieved by a measurement
basis consisting of the projectors $\eta_+$ and $\eta_-$. As far as this
measurement is concerned, an erasure-flag output has equal probabilities
for leading to correct and incorrect identifications by the receiver. In
particular, $I_1(\alpha)$ has a maximum of 0.255992 bits at
$\alpha = \pi/2$, i.e., for an ensemble of orthogonal input states.
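
The single-copy calculation can be checked by brute force. The Python
sketch below computes the mutual information for a projective
measurement along an in-plane direction at angle theta and scans theta;
restricting the scan to projective measurements in the x-y plane is a
simplifying assumption of this illustration, and the function names are
hypothetical.

```python
import math

def h(z):
    """Eq. (13): 2 h(z) = (1+z) log2(1+z) + (1-z) log2(1-z), with 0 log 0 = 0."""
    return 0.5 * sum(x * math.log2(x) for x in (1.0 + z, 1.0 - z) if x > 0.0)

def output_vectors(alpha):
    """In-plane output Bloch vectors for the inputs at angles +alpha and -alpha."""
    x = (1.0 + math.cos(alpha)) / 3.0
    y = math.sqrt(3.0) * math.sin(alpha) / 3.0
    return (x, y), (x, -y)

def single_shot_info(alpha, theta):
    """Mutual information for a measurement along (cos theta, sin theta, 0)."""
    a, b = output_vectors(alpha)
    za = a[0] * math.cos(theta) + a[1] * math.sin(theta)
    zb = b[0] * math.cos(theta) + b[1] * math.sin(theta)
    return -h(0.5 * (za + zb)) + 0.5 * h(za) + 0.5 * h(zb)

def closed_form(alpha):
    """I_1(alpha) = -phi(3 sin^2 alpha) = h(sin(alpha) / sqrt(3))."""
    return h(math.sin(alpha) / math.sqrt(3.0))

thetas = [k * math.pi / 360 for k in range(361)]  # measurement angles in [0, pi]
best_theta = max(thetas, key=lambda th: single_shot_info(math.pi / 2, th))
i1_max = single_shot_info(math.pi / 2, math.pi / 2)
print(best_theta, i1_max, closed_form(math.pi / 2))
```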

Indeed, it is a generic property of channels on $\mathcal{H}_2$ with
binary input alphabets that the maximum achievable rate with respect to
individual measurements can be attained by an orthogonal alphabet.
Furthermore, since a standard orthogonal projection-valued measurement
always suffices for achieving capacity here, this remains true even
without optimizing the ensemble prior probabilities or the measurement
observable. This fact arises in the following manner.

Suppose a fixed measurement is given by the Bloch vectors $n$ and $-n$,
and the binary signal alphabet (yet to be optimized) is associated with
fixed prior probabilities $1 - t$ and $t$ ($t \neq 0, 1$) for its
letters. Let $a$ and $b$ denote the respective Bloch vectors associated
with the signal alphabet, and let $c = (1 - t)a + tb$. The effect of the
channel on these Bloch vectors is to transform them according to some
affine transformation: $a \rightarrow a' = Ma + e$,
$b \rightarrow b' = Mb + e$, etc., where $M$ is a real $3\times 3$
matrix and $e$ is a fixed vector within the Bloch sphere. With these
notations, the mutual information $J$ between input and output for this
ensemble is

\[
\begin{aligned}
J &= -h(c'\cdot n) + (1 - t)\,h(a'\cdot n) + t\,h(b'\cdot n) \\
  &= -h(c\cdot\tilde n + w) + (1 - t)\,h(a\cdot\tilde n + w)
     + t\,h(b\cdot\tilde n + w),
\end{aligned}
\]

where $\tilde n = M^{T} n$ and $w = e\cdot n$. A necessary condition on
any candidates $a$ and $b$ for optimizing this mutual information is
that it be invariant to first order with respect to small variations
about these vectors. Taking into account the constraint that the inputs
be pure states, this leads to the following two variational equations:



\[
0 = \log\left[\frac{(1 + w + c\cdot\tilde n)(1 - w - a\cdot\tilde n)}
  {(1 - w - c\cdot\tilde n)(1 + w + a\cdot\tilde n)}\right] n_a, \tag{16}
\]

\[
0 = \log\left[\frac{(1 + w + c\cdot\tilde n)(1 - w - b\cdot\tilde n)}
  {(1 - w - c\cdot\tilde n)(1 + w + b\cdot\tilde n)}\right] n_b, \tag{17}
\]

where $0$ is the zero vector, $n_a = \tilde n - (\tilde n\cdot a)a$, and
$n_b = \tilde n - (\tilde n\cdot b)b$. It is easy to check that the only
solutions to these occur when either $a = b$,
$\tilde n\cdot a = \tilde n\cdot b$, or $a = -b$. In the first two cases
the mutual information vanishes; in the last case, it is maximal. This
proves the point.
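
The conclusion can be spot-checked numerically. The Python sketch below
uses the affine map of the splaying channel read off from Eq. (11),
together with a measurement direction and prior probability that are
arbitrary choices for this illustration, and compares randomly sampled
pure-state pairs against the antipodal pair aligned with the transformed
measurement vector. It is a sampling check under these assumptions, not
a proof.

```python
import math
import random

def h(z):
    """Eq. (13): 2 h(z) = (1+z) log2(1+z) + (1-z) log2(1-z), with 0 log 0 = 0."""
    return 0.5 * sum(x * math.log2(x) for x in (1.0 + z, 1.0 - z) if x > 0.0)

# Affine action of the splaying channel on Bloch vectors, from Eq. (11).
M = [[1.0 / 3.0, 0.0, 0.0], [0.0, math.sqrt(3.0) / 3.0, 0.0], [0.0, 0.0, 0.0]]
E_SHIFT = [1.0 / 3.0, 0.0, 0.0]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def channel(v):
    return [dot(M[r], v) + E_SHIFT[r] for r in range(3)]

def mutual_info(a, b, n, t):
    """J = -h(c'.n) + (1-t) h(a'.n) + t h(b'.n) for priors (1-t, t)."""
    a_out, b_out = channel(a), channel(b)
    c_out = [(1.0 - t) * x + t * y for x, y in zip(a_out, b_out)]
    return -h(dot(c_out, n)) + (1.0 - t) * h(dot(a_out, n)) + t * h(dot(b_out, n))

def random_unit(rng):
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-12:
            return [x / norm for x in v]

n = [math.cos(0.7), math.sin(0.7), 0.0]  # hypothetical measurement direction
t = 0.35                                 # hypothetical prior probability

# Transformed measurement vector M^T n and its unit vector.
n_tilde = [sum(M[r][k] * n[r] for r in range(3)) for k in range(3)]
length = math.sqrt(dot(n_tilde, n_tilde))
a_star = [x / length for x in n_tilde]
b_star = [-x for x in a_star]

# Both orderings are tried because the priors are unequal.
antipodal_best = max(mutual_info(a_star, b_star, n, t), mutual_info(b_star, a_star, n, t))

rng = random.Random(0)
sampled_best = max(mutual_info(random_unit(rng), random_unit(rng), n, t) for _ in range(20000))
print(antipodal_best, sampled_best)
```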

In summary, I have shown that, contrary to some intuition, there exist
noisy quantum channels for which nonorthogonal input states lead to the
largest reliable information transfer rate. In the particular example
here, and indeed for all possible channels on $\mathcal{H}_2$,
collective measurements appear to play a crucial role in bringing about
the effect whenever it exists. However, it remains an open question
whether collective measurement following product-state inputs is the one
and only ingredient required for bringing about the optimality of
nonorthogonal inputs: for instance, it is not known whether there exists
a channel on $\mathcal{H}_d$, $d \ge 3$, for which the capacity $C_1$ is
only attained for a nonorthogonal input alphabet.

The particular example exhibited here was somewhat contrived, being
built explicitly to show the desired effect. However, since the
completion of this work, several "real world" channels have been
discovered (through numerical simulation) to require nonorthogonal
inputs to achieve capacity. In fact, the effect appears to be generic
for channels of a certain dissipative character, the standard amplitude
damping channel being one such example. An extended discussion of these
channels will appear elsewhere [17].